(54) Title: OPTIMIZATION OF NETWORK PROTOCOL OPTIONS BY REINFORCEMENT LEARNING AND PROPAGATION



A learning component of a TFTP server interacts with clients and the environment by conducting different trials of various options in different states.

The learning component of the TFTP sender receives performance feedback for these trials as rewards.
(57) Abstract: In one embodiment, a method for optimization of network protocol options with reinforcement learning and propagation is disclosed. The method comprises: interacting, by a learning component of a server of a network, with one or more clients and an environment of the network; conducting, by the learning component, different trials of one or more options in different states for network communication via a protocol of the network; receiving, by the learning component, performance feedback for the different trials as rewards; and utilizing, by the learning component, the different trials and associated resulting rewards to improve a decision-making policy associated with the server for negotiation of the one or more options. Other embodiments are also described.
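The trial-and-reward loop in the abstract can be illustrated with a minimal sketch. It assumes a per-state epsilon-greedy learner that keeps a running mean of rewards for each option value (a simple contextual-bandit method chosen here for illustration; the patent does not name an algorithm), the TFTP `blksize` option as the negotiated option, and a made-up reward table standing in for measured transfer performance. `OptionLearner`, `MEAN_REWARD`, and `run_demo` are hypothetical names.

```python
import random
from collections import defaultdict

# Hypothetical candidate values for the TFTP "blksize" option (RFC 2348).
# The patent text does not say which options or values are learned.
BLKSIZE_CHOICES = [512, 1024, 4096, 8192]

# Made-up mean rewards standing in for measured transfer performance:
# big blocks pay off on a clean link, small blocks on a lossy one.
MEAN_REWARD = {
    "clean": {512: 1.0, 1024: 2.0, 4096: 3.0, 8192: 5.0},
    "lossy": {512: 3.0, 1024: 2.0, 4096: 1.5, 8192: 0.5},
}


class OptionLearner:
    """Epsilon-greedy learner keeping a running-mean reward per (state, option)."""

    def __init__(self, actions, epsilon=0.1, seed=None):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value = defaultdict(lambda: {a: 0.0 for a in self.actions})
        self.count = defaultdict(lambda: {a: 0 for a in self.actions})

    def choose(self, state):
        """Pick the option for one trial: random with probability epsilon, else greedy."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        values = self.value[state]
        return max(self.actions, key=lambda a: values[a])

    def update(self, state, action, reward):
        """Fold one trial's reward into the running mean for that option."""
        self.count[state][action] += 1
        n = self.count[state][action]
        old = self.value[state][action]
        self.value[state][action] = old + (reward - old) / n

    def policy(self):
        """Current greedy policy: best-valued option for each visited state."""
        return {s: max(v, key=v.get) for s, v in self.value.items()}


def run_demo(steps=2000, seed=1):
    """Simulate trials against the made-up reward table and return the policy."""
    env_rng = random.Random(seed)
    learner = OptionLearner(BLKSIZE_CHOICES, epsilon=0.2, seed=seed + 1)
    for _ in range(steps):
        state = env_rng.choice(sorted(MEAN_REWARD))
        action = learner.choose(state)
        reward = MEAN_REWARD[state][action] + env_rng.gauss(0.0, 0.5)
        learner.update(state, action, reward)
    return learner.policy()


print(run_demo())
```

Because this sketch treats each state independently, it omits state transitions; a full reinforcement-learning formulation (for example, Q-learning) would also credit actions for their effect on future states.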
The learning component of the TFTP server utilizes the past trials and resulting rewards to improve its decision-making policy for option negotiation.

The learned policies for various option implementation decisions are uploaded, along with the observed configurations of the environment, to a centralized place.

Other TFTP servers download the resources and use the policy of the most similar environment as the initial point to start a new learning process in their environments.
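The propagation step (upload learned policies together with the observed environment configuration, then start other servers from the policy of the most similar environment) might be sketched as follows. This assumes an in-memory list as the "centralized place", numeric feature tuples as the environment description, and Euclidean nearest neighbour as the similarity measure. `PolicyRepository`, `most_similar`, and `initial_choice` are hypothetical names, and the patent does not specify the storage format or similarity metric.

```python
import math
from dataclasses import dataclass, field


@dataclass
class PolicyRecord:
    # Observed environment configuration as numeric features
    # (hypothetical, e.g. RTT in ms, loss rate in %, link speed in Mbit/s).
    config: tuple
    # Learned policy mapping a state to an option value.
    policy: dict


@dataclass
class PolicyRepository:
    """In-memory stand-in for the centralized policy store."""
    records: list = field(default_factory=list)

    def upload(self, config, policy):
        """Store a learned policy along with its environment configuration."""
        self.records.append(PolicyRecord(tuple(config), dict(policy)))

    def most_similar(self, config):
        """Policy of the nearest stored environment, or None if empty."""
        if not self.records:
            return None
        best = min(self.records, key=lambda r: math.dist(r.config, config))
        return dict(best.policy)


def initial_choice(repo, config, state, default=512):
    """Starting option value for a new server's learning process.

    Falls back to 512, the default TFTP block size, when no policy applies.
    """
    policy = repo.most_similar(config) or {}
    return policy.get(state, default)


repo = PolicyRepository()
repo.upload((1.0, 0.0, 1000.0), {"clean": 8192, "lossy": 4096})  # hypothetical LAN server
repo.upload((80.0, 2.0, 10.0), {"clean": 1024, "lossy": 512})    # hypothetical WAN server
print(repo.most_similar((70.0, 1.5, 20.0)))  # {'clean': 1024, 'lossy': 512}
```

Raw Euclidean distance lets large-magnitude features (link speed in this example) dominate the comparison, so a real deployment would likely normalize each feature before measuring similarity.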